17 research outputs found

A test of independence in two-way contingency tables based on maximal correlation

Author: Rizzo M.L.
Székely G.J.
Yenigün C.D.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2011

Maximal correlation has several desirable properties as a measure of dependence, including the fact that it vanishes if and only if the variables are independent. Except for a few special cases, it is hard to evaluate maximal correlation explicitly. We focus on two-dimensional contingency tables and discuss a procedure for estimating maximal correlation, which we use for constructing a test of independence. We compare the maximal correlation test with other tests of independence by Monte Carlo simulations. When the underlying continuous variables are dependent but uncorrelated, we point out some cases for which the new test is more powerful.
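
As a rough illustration of the quantity discussed in this abstract, the sketch below computes a plug-in maximal correlation for a two-way table of counts. It relies on the standard result that, for a finite table, the maximal correlation equals the largest singular value of the standardized residual matrix (the table of joint proportions minus the product of its margins, scaled by the square roots of the margins). This is not necessarily the estimator or test used by the authors; the function name `maximal_correlation`, the power-iteration approach, and the example tables are assumptions made for illustration, and every row and column total is assumed to be positive.

```python
import math


def maximal_correlation(table, iterations=500):
    """Plug-in maximal correlation of a two-way table of counts.

    Illustrative sketch only (hypothetical helper, not the paper's procedure).
    Assumes every row total and column total is strictly positive.
    """
    total = sum(sum(row) for row in table)
    p = [[x / total for x in row] for row in table]
    r = [sum(row) for row in p]            # row margins
    c = [sum(col) for col in zip(*p)]      # column margins
    n_rows, n_cols = len(r), len(c)

    # Standardized residuals: subtracting r_i * c_j removes the trivial
    # singular value 1 that corresponds to constant functions.
    b = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]

    def times_b(v):
        return [sum(b[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]

    def times_b_transposed(u):
        return [sum(b[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]

    def norm(x):
        return math.sqrt(sum(t * t for t in x))

    # Power iteration on B^T B from an arbitrary non-constant start vector.
    v = [1.0 + 0.1 * j for j in range(n_cols)]
    v = [t / norm(v) for t in v]
    for _ in range(iterations):
        w = times_b_transposed(times_b(v))
        size = norm(w)
        if size < 1e-15:                   # residuals are numerically zero
            return 0.0
        v = [t / size for t in w]
    return norm(times_b(v))


independent = [[10, 20], [30, 60]]         # rows are proportional
dependent = [[30, 10], [10, 30]]
print(maximal_correlation(independent))
print(maximal_correlation(dependent))
```

For a 2x2 table the result coincides with the absolute value of the phi coefficient, which gives a quick sanity check on the second example.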

Bilkent University Institutional Repository

Solving non-uniqueness in agglomerative hierarchical clustering using multidendrograms

Author: A.D. GORDON
Alberto Fernández
B.J.T. MORGAN
G. HART
G.J. SZÉKELY
G.N. LANCE
J. MACCUISH
J.H. WARD Jr.
P.H.A. SNEATH
R.M. CORMACK
Sergio Gómez
T. BACKELJAU
V. ARNAU
W.A. KLOOT VAN DER
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/06/2009

In agglomerative hierarchical clustering, pair-group methods suffer from a problem of non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional approach to this drawback has been to break ties between distances using an arbitrary criterion, which results in different hierarchical classifications depending on the criterion followed. In this article we propose a variable-group algorithm that consists of grouping more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of the Lance and Williams' formula which enables the implementation of the algorithm in a recursive way. Comment: Free Software for Agglomerative Hierarchical Clustering using Multidendrograms available at http://deim.urv.cat/~sgomez/multidendrograms.ph

arXiv.org e-Print Archive

Crossref

Research Papers in Economics

Reliability Maps: A Tool to Enhance Probability Estimates and Improve Classification Accuracy (Best paper award)

Author: A. Bella
A.C. Lorena
A.H. Murphy
B. Zadrozny
E. Allwein
G. Shafer
G.J. Székely
J. Fan
J.D. Zhou
M. Galar
P.N. Bennett
R.E. Schapire
T. Dietterich
T. Windeatt
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Crossref

Explore Bristol Research

Evaluating observed versus predicted forest biomass: R-squared, index of agreement or maximal information coefficient?

Author: Burnham K.P.
Crookston N.L.
Eggleston H.S.
López-Moreno J.I.
Mallows C.L.
Montero G.
Næsset E.
R Development Core Team
Székely G.J.
Temesgen H.
Theil H.
Weisberg S.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2019

The accurate prediction of forest above-ground biomass is nowadays key to implementing climate change mitigation policies, such as reducing emissions from deforestation and forest degradation. In this context, the coefficient of determination ($R^2$) is widely used as a means of evaluating the proportion of variance in the dependent variable explained by a model. However, the validity of $R^2$ for comparing observed versus predicted values has been challenged in the presence of bias, for instance in remote sensing predictions of forest biomass. We tested suitable alternatives, e.g. the index of agreement ($d$) and the maximal information coefficient (MIC). Our results show that $d$ renders systematically higher values than $R^2$, and may easily lead to regarding as reliable models which included an unrealistic amount of predictors. Results seemed better for MIC, although MIC favoured local clustering of predictions, whether or not they corresponded to the observations. Moreover, $R^2$ was more sensitive to the use of cross-validation than $d$ or MIC, and more robust against overfitted models. Therefore, we discourage the use of statistical measures alternative to $R^2$ for evaluating model predictions versus observed values, at least in the context of assessing the reliability of modelled biomass predictions using remote sensing. For those who consider $d$ to be conceptually superior to $R^2$, we suggest using its square $d^2$, in order to be more analogous to $R^2$ and hence facilitate comparison across studies.
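
To make the comparison in this abstract concrete, here is a minimal sketch that computes the coefficient of determination and an index of agreement for observed versus predicted values. The abstract gives no formulas, so both definitions are assumptions: $R^2$ is taken as 1 - SSE/SST about the observed mean, and $d$ follows the commonly cited Willmott form. The function names and the toy biomass values (a constant bias of +5) are hypothetical.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SSE / SST (assumed definition)."""
    mean_obs = fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def index_of_agreement(observed, predicted):
    """Willmott-style index of agreement d (assumed definition)."""
    mean_obs = fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # "Potential error": distances of predictions and observations
    # from the observed mean, summed before squaring.
    potential = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                    for o, p in zip(observed, predicted))
    return 1.0 - sse / potential


# Hypothetical data: every prediction overshoots the observation by 5.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [15.0, 25.0, 35.0, 45.0]

r2 = r_squared(observed, predicted)
d = index_of_agreement(observed, predicted)
print(f"R^2 = {r2:.3f}, d = {d:.3f}, d^2 = {d * d:.3f}")
# prints "R^2 = 0.800, d = 0.952, d^2 = 0.907"
```

On this toy example $d$ exceeds $R^2$, in line with the abstract's observation that $d$ tends to give higher values, and squaring $d$ brings it closer to the $R^2$ scale. A single example of this kind is illustrative only and says nothing about the study's data.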

Crossref

Directory of Open Access Journals

Bangor University Research Portal

Distribution of antioxidant components in roots of different red beets (Beta vulgaris L.) cultivars

Author: B. Szabó-Nótin
Benzie I.I.F.
Berrada M.
Castellar R.
D. Székely
Georgiev V.G.
J. Ivanics
J. Monspart-Sényi
Kahkonen M.P.
Kapadia G.J.
Kugler F.
Kujala T.S.
L. Szalóki-Dorkó
M. Stéger-Máté
Nagy-Gasztonyi M.
Nilsson T.
Ninfali P.
Pedreno M.A.
Singleton V.L.
Stinzing F.C.
Vinson J.A.
Publication venue: 'Akademiai Kiado Zrt.'
Publication date: 01/01/2014

Beetroot is typically served in winter in the form of pickles or juice, but its nutritional value means it deserves wider consumption. Its curative effect is largely due to its many vitamins, minerals, and compounds with antioxidant activity. However, the distribution of biologically active compounds differs considerably among the parts of the root. Based on our results, we compared the morphology and some inner contents (soluble solid content, colour, betacyanin, betaxanthin, and polyphenol contents, antioxidant activity, and some flavonoids) of two beetroot cultivars. The morphological investigations showed that the ‘Cylindre’ cultivar had more favourable crop parameters than the ‘Alto F1’ cultivar. In the ‘Cylindre’ cultivar the polyphenol content and the antioxidant capacity were significantly higher than in the ‘Alto F1’ cultivar. Determination of the betanin contents of the investigated beetroots showed that both betacyanin and betaxanthin contents were higher in the ‘Cylindre’ cultivar. Chlorogenic acid, gallic acid, and coumaric acid were identified from the HPLC peaks in the studied beetroot cultivars.

Crossref

Repository of the Academy's Library

Elementary symmetric polynomials of increasing order

Author: A.J. Es van
G. Halász
G.J. Székely
H. Callaert
P.J. Bickel
R. Helmers
S. Karlin
T.F. Móri
W. Feller
W. Hoeffding
W.R. Zwet van
Y. Maesono
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Piecewise Approximate Bayesian Computation: fast inference for discretely observed Markov models using a factorised posterior distribution

Author: A. Golightly
B.W. Silverman
C. Andrieu
D. Moriña
D.J. Wilkinson
E. Gabriel
E. McKenzie
E.B. Sudderth
G.B. Durham
G.J. Székely
J.C. Cox
J.K. Pritchard
J.M. Marin
K. Fukunaga
K.V. Mardia
M.A. Al-Osh
M.A. Beaumont
M.G.B. Blum
N.G. Kampen Van
P. Fearnhead
P. Marjoram
P. Neal
P. Wand
R.D. Wilkinson
R.J. Boys
S. P. Preston
S. R. White
T. Kypraios
T. McKinley
T. Toni
T.P. Minka
Y. Aït-Sahalia
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Many modern statistical applications involve inference for complicated stochastic models for which the likelihood function is difficult or even impossible to calculate, and hence conventional likelihood-based inferential techniques cannot be used. In such settings, Bayesian inference can be performed using Approximate Bayesian Computation (ABC). However, in spite of many recent developments to ABC methodology, in many applications the computational cost of ABC necessitates the choice of summary statistics and tolerances that can potentially severely bias the estimate of the posterior. We propose a new “piecewise” ABC approach suitable for discretely observed Markov models that involves writing the posterior density of the parameters as a product of factors, each a function of only a subset of the data, and then using ABC within each factor. The approach has the advantage of side-stepping the need to choose a summary statistic and it enables a stringent tolerance to be set, making the posterior “less approximate”. We investigate two methods for estimating the posterior density based on ABC samples for each of the factors: the first is to use a Gaussian approximation for each factor, and the second is to use a kernel density estimate. Both methods have their merits. The Gaussian approximation is simple, fast, and probably adequate for many applications. On the other hand, using instead a kernel density estimate has the benefit of consistently estimating the true piecewise ABC posterior as the number of ABC samples tends to infinity. We illustrate the piecewise ABC approach with four examples; in each case, the approach offers fast and accurate inference

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Springer - Publisher Connector

PubMed Central